
Enrichment - The Chi-Square Test

Astronomers commonly combine the χ² (chi-square) test with epoch folding in order to investigate possible periodic behavior in the light curve of a source. You can understand why this procedure is required by examining the light curve below. Ask yourself, "How could we possibly even take a guess at the length of the period?" Three hundred thousand points! They won't fit in your calculator, and who wants to do such a calculation by hand? It is time to bring in a mathematical procedure and a computer.

[Figure: GX301-2 light curve data]

Part I - The Method

Our procedure will be to perform an epoch fold and then run a χ² test. So we need to understand what χ² is and how it can be used... Welcome to the χ² test!

Whenever you try to fit data by an equation, you ultimately need to know if the equation you have chosen is a "good fit" to your data. To decide this, you need some measure of the "goodness of fit" that lets you determine quantitatively whether the fit is acceptable. In general, this measure is built on the idea that a "good fit" of an equation to data minimizes the weighted sum of the squares of the deviations between the fitted values and the measured values. χ² is a convenient measure of the goodness of fit of an equation (any equation) to a set of measured data. Specifically, it is the weighted sum of the squared differences between the measured and calculated values.

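The statistic χ², which is closely related to the variance of the fit, can be written

$$\chi^2 = \sum_{i=1}^{N} \frac{\left[y_i - y(x_i)\right]^2}{\sigma_i^2}$$

where the sum runs over all N data points. In this sum, $\sigma_i^2$ is the variance (the square of the calculated error) of each point, $y_i$ is the measured value of y at a given point, and $y(x_i)$ is the fitted value of y at that point.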
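Students who want to try this on a computer can start from the minimal Python sketch below, which assumes NumPy is available. The function name `chi_square` is our own illustrative choice. The function adds up the squared differences between measured and fitted values, dividing each one by that point's variance.

```python
import numpy as np

def chi_square(y, y_fit, sigma):
    """Weighted sum of squared deviations between measured and fitted values."""
    y = np.asarray(y, dtype=float)          # measured values y_i
    y_fit = np.asarray(y_fit, dtype=float)  # fitted values y(x_i)
    sigma = np.asarray(sigma, dtype=float)  # error on each point (sigma_i)
    return float(np.sum(((y - y_fit) / sigma) ** 2))

# Three measurements, each with an error of 0.5; only the middle point misses the fit
print(chi_square([1.0, 2.0, 3.0], [1.0, 2.5, 3.0], [0.5, 0.5, 0.5]))  # 1.0
```

The middle point is off by exactly one error bar, so it contributes 1 to χ². The other two points contribute nothing.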

If the fitted values y(x_i) are good approximations of the measured values y_i, then χ² is low and a good fit can be claimed. If, however, χ² is high, the fit is not good. (We will discuss what is meant by "low" and "high" a little later.)

Note, however, that in a period search astronomers use χ² in a very different way than statisticians usually do. Statisticians look for a small χ² value, whereas astronomers look for a large one. Why is this? Before we run the test, we have no idea what the period is, what the shape of the pulse is, or anything else about the modulation. What we do know is that if the period we are testing is incorrect, the result of our epoch folding is something close to a flat line. So we use a flat line (representing the average value across all the measured data) as our fitting equation. Then, if χ² is small, we know that the data are well represented by a flat line and no modulation exists at that period. Only if χ² is large is there some possibility that a periodic modulation at the tested period exists in the data.
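The flat-line comparison described in the previous paragraph can be sketched the same way. This is a minimal illustration that assumes NumPy, and `chi_square_flat` is a hypothetical name. We use the weighted mean as the flat line; when every point has the same error, this is just the ordinary average. A constant data set gives χ² = 0, while a strongly modulated one gives a large value.

```python
import numpy as np

def chi_square_flat(y, sigma):
    """chi-square of the data against a flat line at their weighted mean."""
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(sigma, dtype=float) ** 2   # weights = 1 / variance
    mean = np.sum(w * y) / np.sum(w)                # level of the flat line
    return float(np.sum(w * (y - mean) ** 2))

err = [1.0, 1.0, 1.0, 1.0]
print(chi_square_flat([5.0, 5.0, 5.0, 5.0], err))  # 0.0: a flat line fits perfectly
print(chi_square_flat([3.0, 7.0, 3.0, 7.0], err))  # 16.0: strong modulation
```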

Chi Squared

Now that we understand what χ² is, let us return to our procedure. To determine whether periodic behavior occurs within a set of data, and at what period, we epoch-fold the entire data set at a given trial period and then calculate χ² for the resulting fold. This procedure is repeated for every period we want to test.
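Below is one possible sketch of a single epoch fold followed by the χ² test. It assumes NumPy, independent Gaussian errors, and an arbitrary choice of 10 phase bins. The helpers `epoch_fold` and `folded_chi_square` are hypothetical names, and the sinusoidal data are invented for illustration. Here is a rough guide to "low" and "high": if there is no modulation at the tested period, χ² from M phase bins is expected to be near M - 1 (the number of degrees of freedom). Values far above that hint at a signal.

```python
import numpy as np

def epoch_fold(times, rates, errors, period, nbins=10):
    """Fold a light curve on a trial period; return mean rate and its error per phase bin."""
    times = np.asarray(times, dtype=float)
    rates = np.asarray(rates, dtype=float)
    errors = np.asarray(errors, dtype=float)
    phase = (times / period) % 1.0                       # phase of each point, in [0, 1)
    bins = np.minimum((phase * nbins).astype(int), nbins - 1)
    profile = np.full(nbins, np.nan)                      # empty bins stay NaN
    profile_err = np.full(nbins, np.nan)
    for b in range(nbins):
        in_bin = bins == b
        n = np.count_nonzero(in_bin)
        if n > 0:
            profile[b] = rates[in_bin].mean()
            # error of the mean of n points: sqrt(sum of their variances) / n
            profile_err[b] = np.sqrt(np.sum(errors[in_bin] ** 2)) / n
    return profile, profile_err

def folded_chi_square(profile, profile_err):
    """chi-square of a folded profile against a flat line at its weighted mean."""
    ok = ~np.isnan(profile)                               # skip empty bins
    w = 1.0 / profile_err[ok] ** 2
    mean = np.sum(w * profile[ok]) / np.sum(w)
    return float(np.sum(w * (profile[ok] - mean) ** 2))

# Illustrative synthetic data (not real observations): period 5 plus unit noise
rng = np.random.default_rng(0)
t = np.arange(0.0, 100.0, 0.25)
err = np.ones(t.size)
rate = 10.0 + 3.0 * np.sin(2 * np.pi * t / 5.0) + rng.normal(0.0, 1.0, t.size)

right = folded_chi_square(*epoch_fold(t, rate, err, 5.0))
wrong = folded_chi_square(*epoch_fold(t, rate, err, 7.3))
print(right, wrong)   # the true period gives by far the larger chi-square
```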

A graph can then be generated showing the χ² calculated for each tested period. Astronomers are then interested in looking at the data folded on the periods with large χ² values.
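Repeating the fold over many trial periods produces the numbers behind a χ² versus period graph. The sketch below assumes NumPy and uses `np.bincount` to average each phase bin quickly. `chi_square_at_period` and `period_scan` are hypothetical names. The light curve is synthetic data with a 41-day modulation, not the real GX301-2 data.

```python
import numpy as np

def chi_square_at_period(times, rates, errors, period, nbins=10):
    """Epoch-fold at one trial period and return chi-square against a flat line."""
    phase = (times / period) % 1.0
    bins = np.minimum((phase * nbins).astype(int), nbins - 1)
    counts = np.bincount(bins, minlength=nbins)
    sums = np.bincount(bins, weights=rates, minlength=nbins)
    var_sums = np.bincount(bins, weights=errors ** 2, minlength=nbins)
    ok = counts > 0                              # ignore empty phase bins
    profile = sums[ok] / counts[ok]              # mean rate in each phase bin
    w = counts[ok] ** 2 / var_sums[ok]           # 1 / (error of each bin mean)^2
    mean = np.sum(w * profile) / np.sum(w)       # the flat-line model
    return float(np.sum(w * (profile - mean) ** 2))

def period_scan(times, rates, errors, trial_periods, nbins=10):
    """chi-square for every trial period: the data behind a chi-square versus period plot."""
    return np.array([chi_square_at_period(times, rates, errors, p, nbins)
                     for p in trial_periods])

# Illustrative synthetic light curve (NOT the GX301-2 data): 41-day modulation plus noise
rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 3650.0, 5000))      # about 10 years, in days
err = np.ones(t.size)
rate = 20.0 + 2.0 * np.sin(2 * np.pi * t / 41.0) + rng.normal(0.0, 1.0, t.size)

periods = np.linspace(30.0, 50.0, 2001)          # trial periods in 0.01-day steps
chi2 = period_scan(t, rate, err, periods)
best = periods[np.argmax(chi2)]
print(f"largest chi-square at P = {best:.2f} days")
```

On this synthetic example the largest χ² should fall very close to 41 days. That trial period is the one an astronomer would fold on and examine further.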

Examine the graph below. What period do you think an astronomer would want to pursue?

[Figure: GX301-2 χ² versus tested period]

There is a clear peak in χ² at 41 days (the other, smaller peaks at fractional values of 41 days are called aliases, which we won't go into here). Now fold the data on that period of 41 days and see what the average light curve looks like. Is some anomalous data point leading the χ² test astray? No. The result is a clear, smooth periodic behavior at a period of 41 days. We have, in fact, found the orbital period of this binary system: it takes 41 days for the neutron star to revolve once around its supergiant companion.

[Figure: GX301-2 light curve folded on the 41-day period]

Lastly, note that if we have in fact found the correct period, the source should exhibit it at all times. Thus, if we looked at small subsets of the data instead of the entire data set, we should still find the modulation at 41 days in each subset. We applied this procedure to the GX301-2 data, and each of the 10 one-year subsets we examined clearly shows the same behavior.
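The subset check can be sketched by splitting the light curve into equal time spans and repeating the period search in each one. This sketch assumes NumPy, and `best_period_per_subset` is a hypothetical helper. The χ² routine is repeated here so the block runs on its own. The 10-year synthetic light curve, split into 10 one-year pieces as in the GX301-2 test, is invented for illustration.

```python
import numpy as np

def chi_square_at_period(times, rates, errors, period, nbins=10):
    """Epoch-fold at one trial period and return chi-square against a flat line."""
    bins = np.minimum((((times / period) % 1.0) * nbins).astype(int), nbins - 1)
    counts = np.bincount(bins, minlength=nbins)
    sums = np.bincount(bins, weights=rates, minlength=nbins)
    var_sums = np.bincount(bins, weights=errors ** 2, minlength=nbins)
    ok = counts > 0
    profile = sums[ok] / counts[ok]
    w = counts[ok] ** 2 / var_sums[ok]
    mean = np.sum(w * profile) / np.sum(w)
    return float(np.sum(w * (profile - mean) ** 2))

def best_period_per_subset(times, rates, errors, trial_periods, n_subsets, nbins=10):
    """Split the data into n_subsets equal time spans; return the best period in each."""
    span = times.max() - times.min()
    which = np.minimum(((times - times.min()) / span * n_subsets).astype(int),
                       n_subsets - 1)             # subset index of every point
    best = []
    for k in range(n_subsets):
        sel = which == k
        chi2 = [chi_square_at_period(times[sel], rates[sel], errors[sel], p, nbins)
                for p in trial_periods]
        best.append(trial_periods[int(np.argmax(chi2))])
    return np.array(best)

# Illustrative synthetic data (NOT the GX301-2 data): about 10 years, 41-day modulation
rng = np.random.default_rng(2)
t = np.sort(rng.uniform(0.0, 3650.0, 5000))
err = np.ones(t.size)
rate = 20.0 + 2.0 * np.sin(2 * np.pi * t / 41.0) + rng.normal(0.0, 1.0, t.size)

subset_best = best_period_per_subset(t, rate, err, np.linspace(30.0, 55.0, 501), n_subsets=10)
print(subset_best)
```

Each one-year subset should give a best period near 41 days. A shorter data span produces a broader χ² peak, so the individual estimates scatter more than the full-data result.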

[Figure: GX301-2 results for the 10 one-year data subsets]

Calculating

It is not so difficult to create a computer code which calculates the reduced χ² for a fit to any given set of data. AP Physics or Pre-Calculus students taking computer programming classes may consider doing so.
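The reduced χ² divides χ² by the number of degrees of freedom, ν = N - p. Here N is the number of data points and p is the number of fitted parameters (p = 1 for a flat line). For Gaussian errors, a good fit gives a reduced χ² near 1. Below is a minimal Python sketch that assumes NumPy; `reduced_chi_square` is a hypothetical function name.

```python
import numpy as np

def reduced_chi_square(y, y_fit, sigma, n_params):
    """chi-square divided by the degrees of freedom (N points minus fitted parameters)."""
    y = np.asarray(y, dtype=float)
    resid = (y - np.asarray(y_fit, dtype=float)) / np.asarray(sigma, dtype=float)
    dof = y.size - n_params
    if dof <= 0:
        raise ValueError("need more data points than fitted parameters")
    return float(np.sum(resid ** 2)) / dof

# Fit a flat line (one parameter: the mean) to four points with unit errors
y = np.array([4.0, 6.0, 5.0, 5.0])
print(reduced_chi_square(y, np.full(4, y.mean()), np.ones(4), n_params=1))  # 2 / 3
```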


Part II - NOW YOU be the scientist...

Once you have your computer program ready to calculate χ², can you determine if there is any periodic behavior in the data set below? If so, what is the period? If not, what constraints does your analysis allow you to put on the lack of periodic behavior, e.g., what range of periods could you test?
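One way to frame the constraints in the last question is to build the list of trial periods from the sampling of the data. The rules used here are common rules of thumb and are not part of the original lesson:

- The shortest testable period is about twice the typical spacing between measurements.
- The longest period should still fit at least a few full cycles into the data; `min_cycles` = 3 is an arbitrary choice.
- Neighboring trial periods are spaced by a fraction of P²/T, where T is the total time span.

`trial_period_grid` is a hypothetical helper, and the sketch assumes NumPy.

```python
import numpy as np

def trial_period_grid(times, min_cycles=3.0, oversample=5):
    """Hypothetical helper: trial periods suited to the sampling of `times`."""
    times = np.sort(np.asarray(times, dtype=float))
    span = times[-1] - times[0]                  # total time baseline T
    dt = np.median(np.diff(times))               # typical spacing between points
    if span <= 0 or dt <= 0:
        raise ValueError("need distinct observation times")
    p_min = 2.0 * dt                             # at least 2 samples per cycle
    p_max = span / min_cycles                    # at least a few full cycles in the data
    periods = [p_min]
    while periods[-1] < p_max:
        # independent trial periods are roughly P**2 / T apart; oversample that step
        periods.append(periods[-1] + periods[-1] ** 2 / (span * oversample))
    grid = np.array(periods)
    return grid[grid <= p_max]

# Example: one measurement per day for 100 days
grid = trial_period_grid(np.arange(0.0, 100.0, 1.0))
print(grid[0], grid[-1], grid.size)
```

For daily sampling over 100 days, this grid runs from 2 days up to about 33 days. The step between trial periods grows with P because the data can distinguish nearby long periods less sharply than nearby short ones.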

To get a copy of the DATA SET you need, you must visit our Web site at http://imagine.gsfc.nasa.gov/docs/teachers/lessons/time/enrichment_data.html or email us.

Imagine the Universe is a service of the High Energy Astrophysics Science Archive Research Center (HEASARC), Dr. Nicholas White (Director), within the Laboratory for High Energy Astrophysics at NASA's Goddard Space Flight Center.

The Imagine Team
Project Leader: Dr. Jim Lochner
All material on this site has been created and updated between 1997-2004.